480 research outputs found

    Winnow based identification of potent hERG inhibitors in silico: comparative assessment on different datasets

    RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit, and for any reuse or distribution it must be made clear to others what the license terms of this work are. Peer reviewed

    Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Datasets

    The ability to interpret the predictions made by quantitative structure activity relationships (QSARs) offers a number of advantages. Whilst QSARs built using non-linear modelling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modelling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting non-linear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to two widely used linear modelling approaches: linear Support Vector Machines (SVM), or Support Vector Regression (SVR), and Partial Least Squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions, using novel scoring schemes for assessing Heat Map images of substructural contributions. We critically assess different approaches to interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed, public domain benchmark datasets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modelling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpreting non-linear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using Open Source programs, which we have made available to the community. These programs are the rfFC package [https://r-forge.r-project.org/R/?group_id=1725] for the R Statistical Programming Language, along with a Python program HeatMapWrapper [https://doi.org/10.5281/zenodo.495163] for Heat Map generation
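
    As a rough illustration of the kind of comparison described above (not the rfFC feature-contribution or HeatMapWrapper workflow itself), the following minimal Python sketch cross-validates a Random Forest against linear SVR and PLS models; the descriptor matrix X and activity vector y are hypothetical placeholders.

```python
# Minimal sketch (not the rfFC/HeatMapWrapper workflow from the paper):
# comparing Random Forest with linear SVR and PLS on a hypothetical
# descriptor matrix X (n_compounds x n_descriptors) and activity vector y.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((100, 50))   # placeholder descriptors
y = rng.random(100)         # placeholder activities

models = {
    "Random Forest": RandomForestRegressor(n_estimators=500, random_state=0),
    "linear SVR": SVR(kernel="linear"),
    "PLS": PLSRegression(n_components=5),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R^2 = {r2.mean():.2f}")
```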

    Comparing the CORAL and random forest approaches for modelling the in vitro cytotoxicity of silica nanomaterials

    Nanotechnology is one of the most important technological developments of the twenty-first century. In silico methods to predict toxicity, such as quantitative structure-activity relationships (QSARs), promote the safe-by-design approach for the development of new materials, including nanomaterials. In this study, a set of cytotoxicity experimental data corresponding to 19 data points for silica nanomaterials was investigated to compare the widely employed CORAL and Random Forest approaches in terms of their usefulness for developing so-called “nano-QSAR” models. “External” leave-one-out cross-validation (LOO) analysis was performed to validate the two different approaches. An analysis of variable importance measures and signed feature contributions for both algorithms was undertaken in order to interpret the models developed. CORAL showed a more pronounced difference between the average coefficient of determination (R²) for training and for LOO (0.83 and 0.65, respectively) compared to Random Forest (0.87 and 0.78 without bootstrap sampling, 0.90 and 0.78 with bootstrap sampling), which may be due to overfitting. From amongst the nanomaterials’ physico-chemical properties, the aspect ratio and zeta potential were found to be the two most important variables for the Random Forest, and the average feature contributions calculated for the corresponding descriptors were consistent with the clear trends observed in the dataset: less negative zeta potential values and lower aspect ratio values were associated with higher cytotoxicity. In contrast, CORAL failed to capture these trends

    Comparing the CORAL and Random Forest approaches for modelling the in vitro cytotoxicity of silica nanomaterials.

    Nanotechnology is one of the most important technological developments of the 21st century. In silico methods to predict toxicity, such as quantitative structure-activity relationships (QSARs), promote the safe-by-design approach for the development of new materials, including nanomaterials. In this study, a set of cytotoxicity experimental data corresponding to 19 data points for silica nanomaterials was investigated to compare the widely employed CORAL and Random Forest approaches in terms of their usefulness for developing so-called 'nano-QSAR' models. 'External' leave-one-out cross-validation (LOO) analysis was performed to validate the two different approaches. An analysis of variable importance measures and signed feature contributions for both algorithms was undertaken in order to interpret the models developed. CORAL showed a more pronounced difference between the average coefficient of determination (R²) for training and for LOO (0.83 and 0.65, respectively), compared to Random Forest (0.87 and 0.78 without bootstrap sampling, 0.90 and 0.78 with bootstrap sampling), which may be due to overfitting. With regard to the physicochemical properties of the nanomaterials, the aspect ratio and zeta potential were found to be the two most important variables for Random Forest, and the average feature contributions calculated for the corresponding descriptors were consistent with the clear trends observed in the data set: less negative zeta potential values and lower aspect ratio values were associated with higher cytotoxicity. In contrast, CORAL failed to capture these trends
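
    The "external" leave-one-out validation mentioned in both records above can be sketched as follows; this is a generic scikit-learn illustration under assumed placeholder data (19 nanomaterials, 5 descriptors), not the CORAL or Random Forest configurations actually used in the study.

```python
# Minimal sketch of "external" leave-one-out (LOO) validation for a
# Random Forest regressor, as one might apply it to a small nano-QSAR
# dataset; X and y are hypothetical descriptors and cytotoxicity values.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.random((19, 5))   # e.g. 19 nanomaterials x 5 physico-chemical descriptors
y = rng.random(19)        # placeholder cytotoxicity endpoint

loo = LeaveOneOut()
predictions = np.empty_like(y)
for train_idx, test_idx in loo.split(X):
    model = RandomForestRegressor(n_estimators=500, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    predictions[test_idx] = model.predict(X[test_idx])

print("LOO R^2:", r2_score(y, predictions))
```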

    Evaluation of Force-Field Calculations of Lattice Energies on a Large Public Dataset, Assessment of Pharmaceutical Relevance, and Comparison to Density Functional Theory

    Crystal lattice energy is a key property affecting the ease of processing pharmaceutical materials during manufacturing, as well as product performance. We present an extensive comparison of 324 force-field protocols for calculating the lattice energies of single-component organic molecular crystals (further restricted to Z′ less than or equal to one), corresponding to a wide variety of force-fields (DREIDING, Universal, CVFF, PCFF, COMPASS, COMPASSII), optimization routines, and other variations, which could be implemented as part of an automated workflow using the industry-standard Materials Studio software. All calculations were validated using a large new dataset (SUB-BIG), which we make publicly available. This dataset comprises public domain sublimation data, from which estimated experimental lattice energies were derived, linked to 235 molecular crystals. Analysis of pharmaceutical relevance was performed according to two distinct methods based upon (A) public and (B) proprietary data. These identified overlapping subsets of SUB-BIG comprising (A) 172 and (B) 63 crystals of putative pharmaceutical relevance, respectively. We recommend a protocol based on the COMPASSII force field for lattice energy calculations of general organic or pharmaceutically relevant molecular crystals. This protocol was the most highly ranked prior to subsetting and was either the top ranking or amongst the top 15 protocols (top 5%) following subsetting of the dataset according to putative pharmaceutical relevance. Further analysis identified scenarios where the lattice energies calculated using the recommended force-field protocol should either be disregarded (values greater than or equal to zero and/or the messages generated by the automated workflow indicate extraneous atoms were added to the unit cell) or treated cautiously (values less than or equal to −249 kJ/mol), as they are likely to be inaccurate. Application of the recommended force-field protocol, coupled with these heuristic filtering criteria, achieved a root mean-squared error (RMSE) around 17 kJ/mol (mean absolute deviation (MAD) around 11 kJ/mol, Spearman’s rank correlation coefficient of 0.88) across all 226 SUB-BIG structures retained after removing calculation failures and applying the filtering criteria. Across these 226 structures, the estimated experimental lattice energies ranged from −60 to −269 kJ/mol, with a standard deviation around 29 kJ/mol. The performance of the recommended protocol on pharmaceutically relevant crystals could be somewhat reduced, with an RMSE around 20 kJ/mol (MAD around 13 kJ/mol, Spearman’s rank correlation coefficient of 0.76) obtained on 62 structures retained following filtering according to pharmaceutical relevance method B, for which the distribution of experimental values was similar. For a diverse set of 17 SUB-BIG entries, deemed pharmaceutically relevant according to method B, this recommended force-field protocol was compared to dispersion-corrected density functional theory (DFT) calculations (PBE + TS). These calculations suggest that the recommended force-field protocol (RMSE around 15 kJ/mol) outperforms PBE + TS (RMSE around 37 kJ/mol), although it may not outperform more sophisticated DFT protocols, and future studies should investigate this.
Finally, further work is required to compare our recommended protocol to other lattice energy calculation protocols reported in the literature, as comparisons based upon previously reported smaller datasets indicated this protocol was outperformed by a number of other methods. The SUB-BIG dataset provides a basis for these future studies and could support protocol refinement
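
    The error metrics and heuristic filtering criteria quoted above can be illustrated with a short sketch (the lattice-energy calculations themselves rely on the Materials Studio workflow and are not reproduced here); calc and expt are hypothetical arrays of calculated and estimated experimental lattice energies in kJ/mol.

```python
# Minimal sketch of the evaluation metrics and numerical filtering criteria
# described above (not the Materials Studio workflow itself); calc and expt
# are hypothetical calculated and estimated experimental lattice energies.
import numpy as np
from scipy.stats import spearmanr

calc = np.array([-75.0, -120.0, 5.0, -260.0, -95.0])   # placeholder values, kJ/mol
expt = np.array([-80.0, -115.0, -60.0, -200.0, -100.0])

keep = calc < 0.0                    # discard values >= 0 kJ/mol
caution = keep & (calc <= -249.0)    # flag very negative values for cautious treatment
calc_f, expt_f = calc[keep], expt[keep]

rmse = np.sqrt(np.mean((calc_f - expt_f) ** 2))
mad = np.mean(np.abs(calc_f - expt_f))
rho = spearmanr(calc_f, expt_f).correlation
print(f"RMSE {rmse:.1f} kJ/mol, MAD {mad:.1f} kJ/mol, Spearman rho {rho:.2f} "
      f"({caution.sum()} retained value(s) flagged for caution)")
```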

    Molecular Fingerprint-Derived Similarity Measures for Toxicological Read-Across: Recommendations for Optimal Use

    Computational approaches are increasingly used to predict toxicity, in part due to pressures to find alternatives to animal testing. Read-across is the “new paradigm” that aims to predict toxicity by identifying similar, data-rich source compounds. This assumes that similar molecules tend to exhibit similar activities, i.e. molecular similarity is integral to read-across. Various molecular fingerprints and similarity measures may be used to calculate molecular similarity. This study investigated the value and concordance of the Tanimoto similarity values calculated using six widely used fingerprints within six toxicological datasets. There was considerable variability in the similarity values calculated from the various molecular fingerprints for diverse compounds, although they were reasonably concordant for homologous series acting via a common mechanism. The results suggest generic fingerprint-derived similarities are likely to be optimally predictive for local datasets, i.e. following sub-categorisation. Thus, for read-across, generic fingerprint-derived similarities are likely to be most predictive when chemicals are first placed into categories (or groups) and similarity is then calculated within those categories, rather than for a whole chemically diverse dataset
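
    A minimal sketch of fingerprint-derived Tanimoto similarity, using RDKit Morgan fingerprints as one example of the fingerprint types compared in such studies; the SMILES strings and fingerprint settings are illustrative assumptions, not the study's exact choices.

```python
# Minimal sketch of fingerprint-derived Tanimoto similarity with RDKit;
# Morgan (circular) fingerprints stand in for the six fingerprint types
# compared in the study, and the SMILES strings are placeholders.
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs

smiles_a, smiles_b = "CCO", "CCN"   # hypothetical source/target compounds
mol_a = Chem.MolFromSmiles(smiles_a)
mol_b = Chem.MolFromSmiles(smiles_b)

fp_a = AllChem.GetMorganFingerprintAsBitVect(mol_a, 2, nBits=2048)  # radius 2
fp_b = AllChem.GetMorganFingerprintAsBitVect(mol_b, 2, nBits=2048)

similarity = DataStructs.TanimotoSimilarity(fp_a, fp_b)
print(f"Tanimoto similarity: {similarity:.2f}")
```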

    Jet energy measurement with the ATLAS detector in proton-proton collisions at √s = 7 TeV

    The jet energy scale and its systematic uncertainty are determined for jets measured with the ATLAS detector at the LHC in proton-proton collision data at a centre-of-mass energy of √s = 7 TeV, corresponding to an integrated luminosity of 38 pb⁻¹. Jets are reconstructed with the anti-kt algorithm with distance parameters R = 0.4 or R = 0.6. Jet energy and angle corrections are determined from Monte Carlo simulations to calibrate jets with transverse momenta pT ≥ 20 GeV and pseudorapidities |η| < 4.5. The jet energy systematic uncertainty is estimated using the single isolated hadron response measured in situ and in test-beams, exploiting the transverse momentum balance between central and forward jets in events with dijet topologies and studying systematic variations in Monte Carlo simulations. The jet energy uncertainty is less than 2.5% in the central calorimeter region (|η| < 0.8) for jets with 60 ≤ pT < 800 GeV, and is maximally 14% for pT < 30 GeV in the most forward region 3.2 ≤ |η| < 4.5. The jet energy is validated for jet transverse momenta up to 1 TeV to the level of a few percent using several in situ techniques by comparing a well-known reference such as the recoiling photon pT, the sum of the transverse momenta of tracks associated to the jet, or a system of low-pT jets recoiling against a high-pT jet. More sophisticated jet calibration schemes are presented based on calorimeter cell energy density weighting or hadronic properties of jets, aiming for an improved jet energy resolution and a reduced flavour dependence of the jet response. The systematic uncertainty of the jet energy determined from a combination of in situ techniques is consistent with the one derived from single hadron response measurements over a wide kinematic range. The nominal corrections and uncertainties are derived for isolated jets in an inclusive sample of high-pT jets. Special cases such as event topologies with close-by jets, or selections of samples with an enhanced content of jets originating from light quarks, heavy quarks or gluons are also discussed and the corresponding uncertainties are determined. © 2013 CERN for the benefit of the ATLAS collaboration
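
    For orientation, the anti-kt distance measure referred to above can be sketched as follows; this illustrates only the clustering metric, with hypothetical (pT, rapidity, ϕ) inputs, and not the ATLAS calibration procedure.

```python
# Minimal sketch of the anti-kt distance measure used for jet clustering
# (not the ATLAS calibration procedure itself); inputs are hypothetical
# (pT [GeV], rapidity, phi) tuples and R is the distance parameter.
import math

def antikt_distances(p_i, p_j, R=0.4):
    """Return the pairwise distance d_ij and beam distance d_iB for anti-kt."""
    pt_i, y_i, phi_i = p_i
    pt_j, y_j, phi_j = p_j
    dphi = math.pi - abs(abs(phi_i - phi_j) - math.pi)   # wrap into [0, pi]
    delta_r2 = (y_i - y_j) ** 2 + dphi ** 2
    d_ij = min(pt_i ** -2, pt_j ** -2) * delta_r2 / R ** 2
    d_iB = pt_i ** -2
    return d_ij, d_iB

print(antikt_distances((45.0, 0.1, 0.2), (30.0, -0.3, 0.5), R=0.4))
```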

    Measurement of the inclusive and dijet cross-sections of b-jets in pp collisions at √s = 7 TeV with the ATLAS detector

    The inclusive and dijet production cross-sections have been measured for jets containing b-hadrons (b-jets) in proton-proton collisions at a centre-of-mass energy of √s = 7 TeV, using the ATLAS detector at the LHC. The measurements use data corresponding to an integrated luminosity of 34 pb⁻¹. The b-jets are identified using either a lifetime-based method, where secondary decay vertices of b-hadrons in jets are reconstructed using information from the tracking detectors, or a muon-based method, where the presence of a muon is used to identify semileptonic decays of b-hadrons inside jets. The inclusive b-jet cross-section is measured as a function of transverse momentum in the range 20 < pT < 400 GeV and rapidity in the range |y| < 2.1. The bb̄-dijet cross-section is measured as a function of the dijet invariant mass in the range 110 < m_jj < 760 GeV, the azimuthal angle difference between the two jets, and the angular variable chi in two dijet mass regions. The results are compared with next-to-leading-order QCD predictions. Good agreement is observed between the measured cross-sections and the predictions obtained using POWHEG + Pythia. MC@NLO + Herwig shows good agreement with the measured bb̄-dijet cross-section. However, it does not reproduce the measured inclusive cross-section well, particularly for central b-jets with large transverse momenta.

    Observation of associated near-side and away-side long-range correlations in √sNN = 5.02 TeV proton-lead collisions with the ATLAS detector

    Two-particle correlations in relative azimuthal angle (Δϕ) and pseudorapidity (Δη) are measured in √sNN = 5.02 TeV p+Pb collisions using the ATLAS detector at the LHC. The measurements are performed using approximately 1 μb⁻¹ of data as a function of transverse momentum (pT) and the transverse energy (ΣETPb) summed over 3.1 < η < 4.9 in the direction of the Pb beam. The correlation function, constructed from charged particles, exhibits a long-range (2 < |Δη| < 5) “near-side” (Δϕ ∼ 0) correlation that grows rapidly with increasing ΣETPb. A long-range “away-side” (Δϕ ∼ π) correlation, obtained by subtracting the expected contributions from recoiling dijets and other sources estimated using events with small ΣETPb, is found to match the near-side correlation in magnitude, shape (in Δη and Δϕ) and ΣETPb dependence. The resultant Δϕ correlation is approximately symmetric about π/2, and is consistent with a dominant cos 2Δϕ modulation for all ΣETPb ranges and particle pT
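
    The pair-counting step behind such a two-particle (Δη, Δϕ) correlation can be sketched as below; this omits the mixed-event normalisation, ΣETPb selection and recoil subtraction used in the actual analysis, and the particle arrays are randomly generated placeholders.

```python
# Minimal sketch of the pair-counting step behind a two-particle (deta, dphi)
# correlation (no mixed-event correction or event-activity selection as in
# the ATLAS analysis); eta and phi are hypothetical charged-particle arrays
# from a single event.
import numpy as np

rng = np.random.default_rng(0)
eta = rng.uniform(-2.5, 2.5, size=200)
phi = rng.uniform(-np.pi, np.pi, size=200)

i, j = np.triu_indices(len(eta), k=1)                       # all distinct particle pairs
deta = eta[i] - eta[j]
dphi = np.mod(phi[i] - phi[j] + np.pi, 2 * np.pi) - np.pi   # wrap to [-pi, pi)
hist, deta_edges, dphi_edges = np.histogram2d(deta, dphi, bins=(50, 36))
print(hist.shape)  # per-event pair distribution in (deta, dphi)
```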
